backup: KEYMAP.jsonl writer/reader and MANIFEST.json schema (Phase 0a)#712
backup: KEYMAP.jsonl writer/reader and MANIFEST.json schema (Phase 0a)#712bootjp wants to merge 11 commits intofeat/backup-phase0a-filenamefrom
Conversation
Builds on PR #711 (filename encoding). Adds two more foundation pieces of the Phase 0 logical-backup decoder. KEYMAP.jsonl (internal/backup/keymap.go) - Append-only JSONL stream of {encoded, original (b64url), kind} records. - Records exist only for entries whose original bytes are NOT recoverable from the encoded filename alone: - KindSHAFallback (segments rendered as <sha32>__<truncated>) - KindS3LeafData (S3 path collisions renamed to .elastickv-leaf-data) - KindMetaCollision (user object key ending in .elastickv-meta.json) - KeymapWriter: streaming append, json encoder configured to skip HTML escapes so user-key bytes round-trip cleanly. Refuses empty encoded or kind so producer bugs surface loudly. Count() exposed for the "omit empty file" decision. - KeymapReader: line-by-line scanner with bounded buffer (1 MiB); blank lines surface as ErrInvalidKeymapRecord rather than being silently skipped so truncated dumps are recognised. - LoadKeymap: convenience helper that materialises the file as a map (last-wins on duplicates). MANIFEST.json (internal/backup/manifest.go) - Manifest, Source, Live, Adapters, Adapter, Exclusions structs matching the schema in docs/design/2026_04_29_proposed_snapshot_logical_decoder.md. - CurrentFormatVersion = 1; ReadManifest refuses format_version > current and format_version == 0 (ErrUnsupportedFormatVersion). - Phase discriminator constants for Phase 0 ("phase0-snapshot-decode") and Phase 1 ("phase1-live-pinned"); Phase 0 manifests must not set Live, Phase 1 must not set Source -- both validated at write and read time. - DisallowUnknownFields on read so format drift surfaces loudly. - Pretty-printed output (2-space indent, no HTML escapes) since MANIFEST.json is operator-facing. - NewPhase0SnapshotManifest seeds the policy fields with the documented defaults so callers can focus on populating per-dump metadata. Tests cover round-trip, sticky-error semantics, last-wins dedup, HTML-escape suppression, future-version refusal, unknown-field refusal, unknown-phase refusal, and the cross-phase Source/Live exclusion rules.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the metadata infrastructure for a backup system, implementing the MANIFEST.json and KEYMAP.jsonl file formats. The manifest provides a central record of backup versions, phases, and adapter scopes, while the keymap facilitates the reversal of encoded filenames to their original keys. Feedback highlights an opportunity to improve the manifest's representation of adapter scopes using pointers to distinguish between excluded and empty states, and suggests renaming a misleading test case in the keymap suite.
| type Adapters struct { | ||
| DynamoDB Adapter `json:"dynamodb"` | ||
| S3 Adapter `json:"s3"` | ||
| Redis Adapter `json:"redis"` | ||
| SQS Adapter `json:"sqs"` | ||
| } |
There was a problem hiding this comment.
The documentation for the Adapters struct describes a distinction between nil and empty values to indicate whether an adapter was included in the scope filter. However, the current implementation uses non-pointer Adapter structs, which means they are always present in the JSON. This can lead to state inconsistencies where the 'not included' state is silently dropped or misrepresented during serialization. Following the rule to avoid silently dropping entries during serialization or normalization, consider using pointers for the fields in the Adapters struct and applying omitempty to them to ensure the intended state is preserved on disk.
type Adapters struct {
DynamoDB *Adapter json:"dynamodb,omitempty"
S3 *Adapter json:"s3,omitempty"
Redis *Adapter json:"redis,omitempty"
SQS *Adapter json:"sqs,omitempty"
}References
- Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.
| } | ||
| } | ||
|
|
||
| func TestKeymapReader_AcceptsBlankLinesByPolicy(t *testing.T) { |
There was a problem hiding this comment.
The test name TestKeymapReader_AcceptsBlankLinesByPolicy is misleading because the test actually asserts that blank lines are rejected with ErrInvalidKeymapRecord. This matches the intended strict JSONL behavior described in the comments. The test name should be updated to reflect that blank lines are rejected.
| func TestKeymapReader_AcceptsBlankLinesByPolicy(t *testing.T) { | |
| func TestKeymapReader_RejectsBlankLines(t *testing.T) { |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0d92449a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err := json.Unmarshal(line, &rec); err != nil { | ||
| r.err = errors.Wrap(ErrInvalidKeymapRecord, err.Error()) | ||
| return KeymapRecord{}, false, r.err | ||
| } | ||
| if rec.Encoded == "" || rec.Kind == "" { | ||
| r.err = errors.Wrap(ErrInvalidKeymapRecord, "missing encoded or kind") | ||
| return KeymapRecord{}, false, r.err | ||
| } | ||
| return rec, true, nil |
There was a problem hiding this comment.
Reject malformed base64 in KEYMAP records
KeymapReader.Next documents malformed base64 as an invalid record, but it currently only unmarshals JSON and checks encoded/kind, so records with invalid original are accepted and cached by LoadKeymap until a later Original() call fails. This defers corruption detection and can make a damaged KEYMAP appear valid during initial ingest.
Useful? React with 👍 / 👎.
| dec := json.NewDecoder(r) | ||
| dec.DisallowUnknownFields() // surface format drift loudly | ||
| if err := dec.Decode(&m); err != nil { | ||
| return Manifest{}, errors.Wrap(ErrInvalidManifest, err.Error()) |
There was a problem hiding this comment.
Reject trailing bytes after MANIFEST JSON object
ReadManifest decodes exactly one JSON value and returns success without checking for EOF, so inputs like {"...":...}{"extra":1} (or other trailing junk) are accepted. That weakens the manifest's strictness guarantees and can hide file corruption or concatenation errors that should fail validation.
Useful? React with 👍 / 👎.
…tickv into feat/backup-phase0a-keymap-manifest
- Adapters fields are now *Adapter pointers (Gemini #98). Nil -> excluded, non-nil empty -> included-with-no-scopes, non-nil populated -> normal. Previous non-pointer struct collapsed both "excluded" and "included-empty" into the same on-disk shape; gemini's reference (don't silently drop entries during serialization) was the right call. Test TestAdaptersStruct_NilVsEmptyDistinguishedOnDisk covers it. - KeymapReader.Next now validates the base64-encoded `original` field at parse time (Codex P1 #179). Previously the JSON parsed fine and the bad base64 surfaced lazily on Original() — that defers corruption detection past initial ingest. Test TestKeymapReader_RejectsMalformedBase64AtParseTime locks in the parse-time validation. - ReadManifest now rejects trailing bytes via dec.More() (Codex P2 #194). Two manifests concatenated, garbage tails, partial-write artifacts: all surface as ErrInvalidManifest. Tests TestReadManifest_RejectsTrailingBytes and TestReadManifest_RejectsTrailingNonWhitespace. - Test name TestKeymapReader_AcceptsBlankLinesByPolicy renamed to TestKeymapReader_RejectsBlankLines (Gemini #160). The test was already asserting rejection; only the name was misleading.
|
Review feedback addressed in Gemini MEDIUM #98 — Adapters nil-vs-empty must be distinguishable on disk Codex P1 #179 — Validate base64 at parse time Codex P2 #194 — Reject trailing bytes after MANIFEST JSON Gemini MEDIUM #160 — Test name fix @claude review |
|
Claude finished @bootjp's task in 2m 38s —— View job Review of
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3d1e6fdf3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case PhasePhase1LivePinned: | ||
| if m.Source != nil { | ||
| return errors.Wrap(ErrInvalidManifest, "phase1 must not set source") |
There was a problem hiding this comment.
Require live metadata for phase1 manifests
Phase 1 manifests are accepted even when live is completely absent, because validation only rejects source in the phase1 branch. That allows WriteManifest/ReadManifest to succeed for a phase1-live-pinned dump with no pinned read_ts, which breaks the manifest’s ability to describe the consistency point and can cause downstream restore/audit logic to treat an incomplete manifest as valid.
Useful? React with 👍 / 👎.
…to feat/backup-phase0a-keymap-manifest
, round 2) Codex P1 #295: validation only rejected source on phase1 manifests but accepted a phase1 dump with live entirely absent. A phase1 dump's whole point is the cluster-wide read_ts pin recorded under Live; a manifest that omits Live cannot describe its consistency point and downstream restore/audit logic must not silently treat it as valid. validatePhaseSpecific now requires Live != nil AND Live.ReadTS != 0 for phase1 dumps. Tests: TestManifest_Phase1MustSetLive, TestManifest_Phase1RejectsZeroReadTS, TestManifest_Phase1WithLiveAndNonZeroReadTSIsValid.
|
Codex P1 #295 addressed in @claude review |
|
Claude finished @bootjp's task in 2m 29s —— View job Review of
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e4cd7d8ea7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func ReadManifest(r io.Reader) (Manifest, error) { | ||
| var m Manifest | ||
| dec := json.NewDecoder(r) | ||
| dec.DisallowUnknownFields() // surface format drift loudly |
There was a problem hiding this comment.
Allow unknown MANIFEST fields for same-major evolution
ReadManifest rejects any unknown JSON key via DisallowUnknownFields, which makes older readers fail as soon as a newer producer adds an optional field while keeping format_version at the same major value. That breaks the stated same-major compatibility model (optional field additions) and turns otherwise valid manifests into hard read failures during mixed-version operation.
Useful? React with 👍 / 👎.
, round 3) Codex P1 #205 (round 2): DisallowUnknownFields broke the documented same-major minor-evolution path. A newer producer that adds an optional field at the same format_version turns older readers into hard read failures during mixed-version operation, even though both nodes are on the "same major" by contract. The format_version field IS the breaking-change signal. Major bumps are gated by ErrUnsupportedFormatVersion; same-major minor evolution must silently tolerate optional new fields. Removed dec.DisallowUnknownFields() from ReadManifest. Replaced TestReadManifest_RejectsUnknownFields with TestReadManifest_AcceptsUnknownFieldsForSameMajorMinorEvolution. The trailing-bytes guard (dec.More() check) is preserved -- it catches concatenation bugs which is a different concern from optional-field tolerance.
|
Codex P1 #205 (round 2) addressed in
Test @claude review |
|
Claude finished @bootjp's task in 1m 50s —— View job Review of
|
| Item | File | Status |
|---|---|---|
| Stale "whitespace" + "io.Discard" in trailing-bytes comment | manifest.go:219-225 |
Flagged rounds 1+2 |
(Codex P1 #295) issue ref in validatePhaseSpecific |
manifest.go:310-311 |
Flagged round 2 |
(Codex P1, follow-up) in new comment |
manifest.go:206 |
New this round |
Codex P1 #205 (round 2) in test comment |
manifest_test.go:148-150 |
New this round |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f231fe790
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Adapters Adapters `json:"adapters"` | ||
| Exclusions Exclusions `json:"exclusions"` | ||
| ChecksumAlgorithm string `json:"checksum_algorithm"` |
There was a problem hiding this comment.
Reject manifests that omit adapters/exclusions
ReadManifest accepts JSON where adapters and/or exclusions are missing, because these are decoded into non-pointer struct fields and validate() only checks scalar policy fields. In that case the sections silently become zero values and the manifest is treated as valid, which can hide truncated/corrupted MANIFEST files and drop provenance about what scopes/filters were actually used. This should fail validation instead of defaulting.
Useful? React with 👍 / 👎.
Codex P2 #146 round 3: ReadManifest accepted manifests where the adapters and/or exclusions sections were omitted entirely from the JSON, because they were decoded into non-pointer struct fields and validate() only checked scalar policy fields. The empty zero-value sections were treated as valid, hiding truncated or corrupted MANIFEST files and dropping provenance about which scopes/filters the producer used. Manifest.Adapters and Manifest.Exclusions are now *Adapters / *Exclusions pointer types. validateRequiredFields rejects nil for either with ErrInvalidManifest. NewPhase0SnapshotManifest seeds them as &Adapters{} / &Exclusions{} so producers continue to satisfy the contract by default; populating individual scopes reaches in via the now-non-nil pointer. Tests TestReadManifest_RejectsMissingAdapters and TestReadManifest_RejectsMissingExclusions cover the new validation path.
|
Codex P2 #146 (round 3) addressed in
Tests: @claude review |
|
Claude finished @bootjp's task in 1m 47s —— View job Review of
|
| Item | Location | Rounds flagged |
|---|---|---|
| Stale "whitespace" + "io.Discard" in trailing-bytes comment | manifest.go:229-235 |
1, 2, 3, 4 |
(Codex P1 #295) in validatePhaseSpecific |
manifest.go:328-330 |
2, 3, 4 |
(Codex P1, follow-up) in ReadManifest |
manifest.go:216 |
2, 3, 4 |
Codex P1 #205 (round 2) in test |
manifest_test.go:148-149 |
3, 4 |
The stale-comment fix has been written out in full in the round-1 and round-2 reviews. Repeating the suggested replacement for manifest.go:229-235:
// MANIFEST.json is exactly one JSON object. Any non-whitespace
// decodable content after it (a second object, junk) points at
// concatenation bugs or partial-write corruption. Trailing
// whitespace and newlines are benign and not flagged.Fix all 7 issue references + stale comment in one pass →
Summary
The Codex P2 #146 round 3 fix in 73641330 is structurally correct. The only remaining work is comment cleanup — the fix logic and all new tests are sound. There are now 7 issue references across the two files, accumulated over 4 rounds, plus the stale io.Discard/whitespace comment that has been flagged since round 1. Addressing all of them in a single pass would close this PR cleanly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 736413302f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if rec.Encoded == "" || rec.Kind == "" { | ||
| r.err = errors.Wrap(ErrInvalidKeymapRecord, "missing encoded or kind") | ||
| return KeymapRecord{}, false, r.err | ||
| } | ||
| if _, err := base64.RawURLEncoding.DecodeString(rec.OriginalB64); err != nil { |
There was a problem hiding this comment.
Require
original field in KEYMAP records
KeymapReader.Next only validates encoded and kind, then base64-decodes OriginalB64; because DecodeString("") succeeds, a record missing original is currently accepted as if the original key were empty bytes. In a corrupted/truncated KEYMAP line this silently rewrites the mapping for that encoded segment and can break exact key recovery for SHA-fallback or collision-renamed entries instead of surfacing ErrInvalidKeymapRecord immediately.
Useful? React with 👍 / 👎.
| if err := dec.Decode(&m); err != nil { | ||
| return Manifest{}, errors.Wrap(ErrInvalidManifest, err.Error()) |
There was a problem hiding this comment.
Check manifest version before strict struct decoding
ReadManifest decodes directly into Manifest before applying the format_version gate, so a newer major manifest that changes a known field type (e.g., phase no longer a string) returns ErrInvalidManifest instead of ErrUnsupportedFormatVersion. That breaks the advertised version-branching contract for callers using errors.Is(err, ErrUnsupportedFormatVersion) to distinguish upgrade-required inputs from malformed files.
Useful? React with 👍 / 👎.
Codex P2 round 5 raised two correctness issues:
1. KeymapReader.Next accepted records missing the `original` field
because base64.RawURLEncoding.DecodeString("") succeeds. A
corrupted line dropping `original` silently rewrote the
encoded->original mapping to empty bytes, breaking exact key
recovery for SHA-fallback / collision-renamed entries. Now the
reader decodes into a presence-aware map first and rejects
records missing any of {encoded, original, kind}. Explicit empty
string remains valid.
2. ReadManifest decoded into Manifest before applying the
format_version gate. A future-major manifest that also changed
the JSON type of a known field (e.g. `phase` string -> int) was
surfacing as ErrInvalidManifest instead of
ErrUnsupportedFormatVersion, breaking the advertised
version-branching contract. Now ReadManifest probes
format_version with a relaxed shape first, branches on the
result, and only runs the strict struct decode on a
known-supported version.
Tests: TestKeymapReader_RejectsMissingOriginalField,
TestKeymapReader_AcceptsExplicitEmptyOriginal,
TestReadManifest_FutureMajorVersionTakesPrecedenceOverTypeMismatch.
|
@claude review Round 5 fixes for Codex P2 findings on commit 7364133:
Verified locally: |
|
Claude finished @bootjp's task in 3m 3s —— View job Review of Round 5 fixes (
|
| Location | Issue | Rounds flagged |
|---|---|---|
manifest.go:260-266 |
Stale "whitespace-only padding" + "io.Discard" in trailing-bytes comment | 1, 2, 3, 4, 5 |
manifest.go:149 |
(Codex P2 #146 round 3). in struct field comment |
4, 5 |
manifest.go:247 |
(Codex P1, follow-up) in ReadManifest comment |
3, 4, 5 |
manifest.go:302-304 |
(Codex P2 #146 round 3) in validateRequiredFields |
4, 5 |
manifest.go:350-353 |
(Codex P1 #295) in validatePhaseSpecific |
2, 3, 4, 5 |
manifest_test.go:58 |
Codex P1 #295 caught in TestManifest_Phase1MustSetLive |
2, 3, 4, 5 |
manifest_test.go:178-179 |
Codex P1 #205 (round 2) in TestReadManifest_AcceptsUnknownFields |
3, 4, 5 |
manifest_test.go:345-348 |
Codex P2 #146 round 3. in TestReadManifest_RejectsMissingAdapters |
4, 5 |
The corrected text for the stale manifest.go:260-266 comment (unchanged since round 1):
// MANIFEST.json is exactly one JSON object. Any non-whitespace
// decodable content after it (a second object, junk) points at
// concatenation bugs or partial-write corruption. Trailing
// whitespace and newlines are benign and not flagged.Fix all accumulated comment issues in one pass →
Summary
The two Round 5 fixes in 2cd58a9 are correct. The field-presence check in decodeKeymapLine and the version-first probe in ReadManifest both address the stated issues accurately, and all new tests target the right regression paths. The only remaining work is comment cleanup: 4 new issue references added this round, plus 8 accumulated from prior rounds, plus the stale io.Discard/whitespace comment that has been flagged since round 1. A single cleanup pass across both files would close this out.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2cd58a932d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // size on the source store). 1 MiB per line is generous; if a record | ||
| // genuinely exceeds it the reader returns a typed error rather than | ||
| // silently truncating. | ||
| const keymapBufSizeReader = 1 << 20 |
There was a problem hiding this comment.
Raise KEYMAP scanner limit above max keymap record size
keymapBufSizeReader is capped at 1 MiB, but a valid record can be larger than that: snapshot keys are allowed up to 1 MiB (store/mvcc_store.go), and original is stored as base64url, which expands a 1 MiB key to ~1.33 MiB before JSON overhead. In this case KeymapReader.Next fails with bufio.Scanner: token too long, so backups containing large-but-valid keys cannot be read back via LoadKeymap/KeymapReader.
Useful? React with 👍 / 👎.
…d 6) Codex P1 round 6 (commit 2cd58a9): The source store (store/mvcc_store.go) caps a single snapshot key at 1 MiB, but `keymapBufSizeReader` was also 1 MiB. KEYMAP records carry the key as base64url, which expands ~4/3 — so a maximum-sized valid key produced a ~1.33 MiB record that bufio.Scanner refused with "token too long", and the dump could not be loaded back via `LoadKeymap` / `KeymapReader`. Bump to 4 MiB. That carries 2× margin over the theoretical worst case (1 MiB key + base64 + JSON overhead) while still bounding pathological lines, and matches the doubling cadence we'd want if the upstream key cap were ever raised. Test: TestKeymapReader_AcceptsMaxSizedOriginal exercises a maxSnapshotKeySize-sized record round-trip through the writer and reader.
|
@claude review Round 6 fix for Codex P1 on commit 2cd58a9: The source store caps a single snapshot key at 1 MiB ( Test: |
|
Claude finished @bootjp's task in 2m 11s —— View job Review of Round 6 fix (
|
| Location | Issue | Rounds flagged |
|---|---|---|
manifest.go:149 |
(Codex P2 #146 round 3). in struct field comment |
4–6 |
manifest.go:222 |
See Codex P2, round 5. in ReadManifest |
5–6 |
manifest.go:247 |
(Codex P1, follow-up) in ReadManifest |
3–6 |
manifest.go:260-266 |
Stale "whitespace-only padding" + "io.Discard" in trailing-bytes comment | 1–6 |
manifest.go:302-304 |
(Codex P2 #146 round 3) in validateRequiredFields |
4–6 |
manifest.go:350-353 |
(Codex P1 #295) in validatePhaseSpecific |
2–6 |
manifest_test.go:58 |
-- the gap Codex P1 #295 caught. |
2–6 |
manifest_test.go:178-179 |
Codex P1 #205 (round 2) |
3–6 |
The stale trailing-bytes comment (manifest.go:260-266) replacement (unchanged since round 1):
// MANIFEST.json is exactly one JSON object. Any non-whitespace
// decodable content after it (a second object, junk) points at
// concatenation bugs or partial-write corruption. Trailing
// whitespace and newlines are benign and not flagged.Fix all accumulated comment issues in one pass →
Summary
The Round 6 buffer-cap fix in 0afb5e3 is correct — the size increase is well-reasoned and the regression test covers the exact failure mode. Three items to clean up: replace the //nolint:mnd with a named constant, drop the two new issue references, and address the 8 accumulated comment issues in manifest.go/manifest_test.go that have been flagged since rounds 1–5.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0afb5e3a54
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if _, err := base64.RawURLEncoding.DecodeString(rec.OriginalB64); err != nil { | ||
| return KeymapRecord{}, errors.Wrap(ErrInvalidKeymapRecord, err.Error()) |
There was a problem hiding this comment.
Reject null
original in KEYMAP records
decodeKeymapLine only checks that the original key exists and that base64.DecodeString(rec.OriginalB64) succeeds. With Go's JSON semantics, "original": null unmarshals into an empty string, and decoding "" succeeds, so an invalid/corrupted record is accepted as an empty original key. This can silently rewrite mappings during LoadKeymap and break exact key recovery for SHA-fallback or collision-renamed entries instead of surfacing ErrInvalidKeymapRecord at read time.
Useful? React with 👍 / 👎.
| if m.Exclusions == nil { | ||
| return errors.Wrap(ErrInvalidManifest, "exclusions section missing") | ||
| } |
There was a problem hiding this comment.
Enforce presence of individual exclusions flags
Validation only checks that the exclusions object exists, not that each required flag is present and typed. Because these fields are plain bool, JSON inputs that omit fields (or set them to null) silently become false, so truncated/corrupted manifests can be accepted with altered exclusion semantics. For example, an omitted preserve_sqs_visibility currently decodes as false and passes validation, losing provenance about how the dump was produced.
Useful? React with 👍 / 👎.
… round 7) Codex round 7 raised two issues on commit 0afb5e3: 1. P1: KEYMAP records with `"original": null` were silently accepted as empty bytes. The presence-aware decode added in round 5 only checked field-name presence; a null literal round-trips through json.Unmarshal into "" for a string field, and base64.RawURLEncoding.DecodeString("") succeeds. Add an explicit json.RawMessage compare against the literal `null` token so corrupted/truncated records fail at parse time instead of rewriting mappings to empty bytes. Apply the same guard to `encoded` and `kind` for symmetry. 2. P2: Exclusions sub-fields (booleans) defaulted to `false` on absence. validateRequiredFields only checked Exclusions != nil; a manifest that omitted `preserve_sqs_visibility` (or any other flag) silently passed with altered exclusion semantics, losing the producer-side provenance the section is meant to capture. Add `validateExclusionsFieldsPresent` that walks the raw JSON payload and rejects manifests missing or null'ing any of the four flags. Tests: - TestKeymapReader_RejectsExplicitNullField (3 sub-cases) - TestReadManifest_RejectsMissingExclusionFlag (5 sub-cases including explicit-null)
|
@claude review Round 7 fixes for Codex round 7 on commit 0afb5e3 (commit
|
|
Claude finished @bootjp's task in 3m 17s —— View job Review of Round 7 fixes (
|
| Location | Issue | First flagged |
|---|---|---|
manifest.go:149 |
Codex P2 #146 (round 3). in struct field comment |
Round 4 |
manifest.go:222 |
See Codex P2, round 5. in ReadManifest comment |
Round 5 |
manifest.go:247 |
(Codex P1, follow-up) in same block |
Round 3 |
manifest.go:260-266 |
Stale "whitespace-only padding" + "io.Discard" in trailing-bytes comment |
Round 1 |
manifest.go:353-355 |
(Codex P2 #146 round 3) in validateRequiredFields |
Round 4 |
manifest.go:400-404 |
(Codex P1 #295) in validatePhaseSpecific |
Round 2 |
manifest_test.go:58 |
-- the gap Codex P1 #295 caught. |
Round 2 |
manifest_test.go:177-179 |
Codex P1 #205 (round 2) in TestReadManifest_AcceptsUnknownFields |
Round 3 |
keymap_test.go:286 |
//nolint:mnd (use named constant) |
Round 6 |
Corrected text for manifest.go:260-266 (unchanged since round 1):
// MANIFEST.json is exactly one JSON object. Any non-whitespace
// decodable content after it (a second object, junk) points at
// concatenation bugs or partial-write corruption. Trailing
// whitespace and newlines are benign and not flagged.Fix all 13 accumulated comment items in one pass →
Summary
The Round 7 fixes in 6b2dea00 are both correct — null-field detection in decodeKeymapLine and presence validation in validateExclusionsFieldsPresent are sound approaches, and all new tests cover the stated failure modes. The only logic-level note is the exclusionsRequiredFields maintenance hazard (worth a PR description callout). Everything else remaining is comment cleanup: 4 new issue references this round plus 9 accumulated from prior rounds that have now been flagged for at least 2 rounds each.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b2dea007f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if probe.FormatVersion == 0 { | ||
| return Manifest{}, errors.Wrapf(ErrUnsupportedFormatVersion, |
There was a problem hiding this comment.
Treat missing format_version as invalid manifest
The version probe maps format_version into a uint32, so when the field is absent it silently becomes 0 and the code returns ErrUnsupportedFormatVersion. That misclassifies malformed/truncated manifests as an upgrade-needed case, which breaks the documented error-branching contract for callers that distinguish unsupported formats from invalid files.
Useful? React with 👍 / 👎.
| if m.FormatVersion == 0 { | ||
| return errors.Wrap(ErrInvalidManifest, "format_version is zero") |
There was a problem hiding this comment.
Reject future format_version when writing manifests
WriteManifest relies on validateRequiredFields, but that check only rejects zero and does not enforce FormatVersion <= CurrentFormatVersion. As a result, a caller can write a manifest with a future version that this same package later rejects in ReadManifest, producing self-incompatible backup metadata from a simple field mutation.
Useful? React with 👍 / 👎.
Codex P2 round 8 (commit 6b2dea0): 1. Missing `format_version` was misclassifying corrupted/truncated manifests as ErrUnsupportedFormatVersion. The version probe decoded straight into `uint32`, so an absent field collapsed silently to 0 and the gate said "upgrade required". A truncated manifest belongs in ErrInvalidManifest so callers branching on errors.Is can distinguish "this file is broken" from "newer producer". Add a presence-and-null probe on the raw JSON `format_version` token before the typed unmarshal. Extracted the whole probe into `probeManifestFormatVersion` to keep ReadManifest's cyclomatic complexity under the project's linter ceiling. 2. `WriteManifest` did not enforce `FormatVersion <= CurrentFormatVersion`. A caller mutating `m.FormatVersion = CurrentFormatVersion + 1` could write a manifest that the same package's ReadManifest then rejects — self-incompatible backup metadata for a one-line field mutation. validateRequiredFields now refuses that case. Tests: - TestReadManifest_RejectsMissingFormatVersion (also asserts the error is *not* ErrUnsupportedFormatVersion). - TestReadManifest_RejectsNullFormatVersion (mirror for explicit JSON `null`). - TestWriteManifest_RejectsFutureFormatVersion (round-trip self-compat check).
|
@claude review Round 8 fixes for Codex P2 round 8 on commit 6b2dea0 (commit
|
|
Claude finished @bootjp's task in 2m 48s —— View job Review of Round 8 fixes (
|
| Location | Issue | First flagged |
|---|---|---|
manifest.go:149 |
Codex P2 #146 (round 3). in Adapters struct field comment |
Round 4 |
manifest.go:222 |
See Codex P2, round 5. in ReadManifest two-phase comment |
Round 5 |
manifest.go:234 |
(Codex P1, follow-up) in ReadManifest unknown-fields comment |
Round 3 |
manifest.go:247-253 |
Stale "even whitespace-only padding" + "io.Discard" in trailing-bytes comment |
Round 1 |
manifest.go:313 |
(Codex P2 round 7) in validateExclusionsFieldsPresent docstring |
Round 7 |
manifest.go:393-395 |
(Codex P2 #146 round 3) in validateRequiredFields |
Round 4 |
manifest.go:443-444 |
(Codex P1 #295) in validatePhaseSpecific |
Round 2 |
manifest_test.go:58 |
-- the gap Codex P1 #295 caught. |
Round 2 |
manifest_test.go:248-250 |
Codex P1 #205 (round 2) caught the earlier... |
Round 3 |
manifest_test.go:362-363 |
-- Codex P2 #194. in TestReadManifest_RejectsTrailingBytes |
Round 1 |
keymap.go:57-58 |
Codex P1 round 6 (commit 2cd58a93) in keymapBufSizeReader comment |
Round 6 |
keymap.go:201 |
Codex P2 round 5 + P1 round 7-follow-up. in decodeKeymapLine |
Round 7 |
keymap.go:205 |
Codex P1 #179. in decodeKeymapLine |
Round 7 |
keymap_test.go:212 |
— Codex P1 #179. in TestKeymapReader_RejectsMalformedBase64AtParseTime |
Round 5 |
keymap_test.go:221-222 |
regression for Codex P1 round 7-follow-up: in TestKeymapReader_RejectsExplicitNullField |
Round 7 |
keymap_test.go:249 |
exercises Codex P2 round 5: in TestKeymapReader_RejectsMissingOriginalField |
Round 7 |
keymap_test.go:273-274 |
regression for Codex P1 round 6: in TestKeymapReader_AcceptsMaxSizedOriginal |
Round 6 |
keymap_test.go:286 |
//nolint:mnd (use named constant testBytePrimeSpread = 251) |
Round 6 |
The stale trailing-bytes comment (manifest.go:247-253) has been flagged every round since round 1. Corrected text (unchanged from prior rounds):
// MANIFEST.json is exactly one JSON object. Any non-whitespace
// decodable content after it (a second object, junk) points at
// concatenation bugs or partial-write corruption. Trailing
// whitespace and newlines are benign and not flagged.Fix all 22 accumulated comment issues in one pass →
Summary
The Round 8 fixes in f13cd1bf are both correct. probeManifestFormatVersion properly distinguishes the three distinct failure modes (malformed, reserved-zero, upgrade-required) and the write-side future-version guard closes the self-incompatibility gap. All four new tests are well-targeted. The only remaining work is comment cleanup — 4 new issue references this round plus 14 accumulated from prior rounds, plus the stale io.Discard/whitespace comment that has been open since round 1. A single pass across all 4 files would clear the board.
Summary
Stacked on top of #711 (filename encoding). Adds two more foundation pieces of the Phase 0 logical-backup decoder.
KEYMAP.jsonl (
internal/backup/keymap.go){encoded, original (b64url), kind}records.KindSHAFallback— segments rendered as<sha32>__<truncated>KindS3LeafData— S3 path collisions renamed to.elastickv-leaf-dataKindMetaCollision— user object key ending in.elastickv-meta.jsonKeymapWriter: streaming append, JSON encoder configured to skip HTML escapes so user-key bytes round-trip cleanly. Refuses emptyencodedorkindso producer bugs surface loudly.Count()exposed for the "omit empty file" decision.KeymapReader: line-by-line scanner with bounded buffer (1 MiB); blank lines surface asErrInvalidKeymapRecordrather than being silently skipped so truncated dumps are recognised.LoadKeymap: convenience helper that materialises the file as a map (last-wins on duplicates).MANIFEST.json (
internal/backup/manifest.go)docs/design/2026_04_29_proposed_snapshot_logical_decoder.md.CurrentFormatVersion = 1;ReadManifestrefusesformat_version > currentandformat_version == 0(ErrUnsupportedFormatVersion).Live, Phase 1 must not setSource— both validated at write and read time.DisallowUnknownFieldson read so format drift surfaces loudly.MANIFEST.jsonis operator-facing.NewPhase0SnapshotManifestseeds policy fields with the documented defaults.Test plan
go test -race ./internal/backup/...— pass.golangci-lint run ./internal/backup/...— clean.Source/Liveexclusion.Self-review
KeymapReaderreturns sticky errors so partial reads cannot be silently treated as success.KeymapWriter/KeymapReaderare not goroutine-safe (per-scope use); manifest helpers are pure.-raceclean.bufio.Writer(64 KiB) for the JSONL stream; bounded scanner buffer (1 MiB) on read.DisallowUnknownFields+ format-version gate prevent silent drift. The phase discriminator's structural rules are enforced symmetrically at write and read.Stacking
Base branch is
feat/backup-phase0a-filename(PR #711). When that lands, this PR's base will switch tomainautomatically.